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Abstract 


The  paper  considers  the  effect  of  roundoff  error  in  the  solution  pro- 
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cess  of  equilibrium  FEM  equations  The  use  of  the  equivalent  perturbation 


matrix,  the  misuse  of  scaling,  and  the  estimation  of  the  condition  number 
are  discussed. 
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1 . Introduction 

The  Finite  Element  Method  (FEM)  is  a successful  blend  of  clever  engi- 
neering, fine  mathematics,  and  ingenious  programming.  When  put  to  use,  the 
product  seems  to  be  slightly  tarnished  by  roundoff  errors.  It  would  be 
nice  to  overlook  them. 

Roundoff  afflicts  (1)  the  numerical  integration  used  to  compute  the 
element  stiffness  matrices,  (2)  the  assemblage  of  the  global  stiffness 
matrix,  (3)  the  load,  or  force,  vector  f which  forms  the  right  hand  side 
of  the  familiar  equation  of  virtual  work,  (4)  the  numerical  solution  of 
the  equation  to  obtain  a vector  u.  Consequently  this  u will  not  repre- 
sent the  approximate  displacement  of  FEM  theory  for  which  all  those  nice 
error  bounds  have  been  proved. 

However  the  same  warning  applies  to  the  exact  solution  of  the  system 
of  equations  delivered  by  the  exact  evaluation  of  the  numerical  integration 
formulas.  Indeed  these  formulas  usually  make  a greater  change  in  the  larger 
elements  of  the  true  stiffness  matrix  than  do  all  the  sources  of  roundoff 
in  (4)  put  together.  The  big  difference  is  that  the  changes  in  the  matrix 
elements  due  to  numerical  integration  are  highly  correlated  whilst  those 
due  to  roundoff  in  (4)  are  not. 

At  present  the  favorite  method  for  solving  the  system  of  linear  equa- 
tions generated  by  the  FEM  is  a refined  version  of  the  familiar  elimination 
process.  As  a result  of  twenty-five  years  of  Intensive  study  the  problem  of 
the  influence  of  roundoff  error  is  well  understood.  The  main  facts  will 
be  presented  here,  with  emphasis  on  the  positive  definite  systems  that  occur 
in  equilibrium  problems. 
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2.  Basic  Misconceptions  About  Gaussian  Elimination 
The  remarks  in  this  section  are  not  confined  to  finite  element  problems, 


MISCONCEPTION  1.  The  tiny  roundoff  errors  which  occur  in  each  of 
the  basic  arithmetic  operations  required  to  solve  systems  with 
hundreds  of  unknowns  by  Gaussian  elimination  may  accumulate  enouqh 
to  spoil  the  results  completely. 


This  fear  was  widespread  in  the  late  1940's. 

REPLY.  When  roundoff  does  lead  to  unacceptable  error  in  the 
solution  this  misfortune  is  attributable  to  a mere  handful 
(perhaps  only  one)  of  the  errors,  those  which  happen  to  occur 
at  sensitive  places.  This  is  in  contrast  to  the  propagated  error 
which  afflicts  initial  value  problems  for  differential  equations 
where  accumulation  does  occur. 


The  standard  example  is  the  Hilbert  segment  (h. . = l/(i+j-l))  which  is 

' J 

discussed  in  detail  in  [4].  The  surprising  fact  is  that  the  initial  errors 
made  in  rounding  off  1/3,  1/6,  1/7,  etc.  to  single  precision  do  more  damage 
to  the  solution  than  all  the  errors  committed  during  the  subsequent  triangular 
factorization  and  backsolving.  This  result  is  typical  for  graded  positive 
definite  matrices.  A matrix  is  graded  if  its  elements  decline  (or  do  not 
increase)  as  their  indices  increase, 

I a 1 j I i |ak4|  if  i 1 k,  j > «.  . 

MISCONCEPTION  2.  In  Gaussian  Elimination  small  pivots  cause 
large  errors  in  the  computed  triangular  factors. 
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REPLY.  The  actual  size  of  a pivot  is  irrelevant.  A small  one  may 
or  may  not  accompany  large  errors.  By  scaling  perversely  any 
pivot  can  be  made  to  look  small.  An  initial  scaling  of  a posi- 
tive definite  matrix  so  that  all  diagonal  elements  are  unity  does 
not  prevent  the  occurrence  of  completely  harmless  small  pivots. 

For  example. 
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MISCONCEPTION  3.  Large  multipliers  in  Gaussian  Elimination  pro- 
voke large  errors  in  the  factorization. 

REPLY.  As  in  No.  2 the  multipliers  are  not  the  relevant  quanti- 
ties. For  arbitrary  matrices  large  multipliers  may  or  may  not 
accompany  large  errors  in  the  computed  triangular  factors.  How- 
ever for  symmetric  positive  definite  matrices  large  multipliers 
cannot  be  blamed  for  large  errors  because  in  any  subsequent  com- 
putation the  large  multiplier  is  always  multiplied  by  other 
elements  so  that  the  product  is  less  than  the  corresponding 
diagonal  element.  For  example, 
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3.  The  Equivalent  Perturbation  Matrix  (EPM) 

Let  us  consider  the  standard  Gaussian  elimination  process  when  applied 


tn  a sparse  positive  definite  n * n matrix  K.  Some  previous  experience 
with  the  matrix  interpretation  of  elimination  is  assumed. 
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From  the  original  K = the  algorithm  implicitly  derives  a sequence  of 

reduced  matrices  K^,  j = 2,...,n,  where  is  of  order  n+l-j,  is  also 

positive  definite,  and  is  written  over  K^"^.  Upon  completion  the  array  K 
holds  an  upper  triangular  matrix  U which  is  related  to  the  by 

(1)  u.j  = • for  j = i,i+l,...,n  . 

The  rows  of  run  from  i to  n to  conform  to  their  actual  positions 

in  the  array  K.  The  multipliers  may  have  been  discarded  but,  in  any  case 
they  are  the  nontrivial  elements  of  a unit  («,„  = 1)  lower  triangular  matrix 
L.  In  exact  arithmetic 

(2)  U = DLT  . 

Because  of  roundoff  errors  LU  (the  exact  product)  does  not  equal  K and 
the  so-called  equivalent  perturbation  matrix  E is  defined  by 

(3)  K-  E = LU  . 

Note  that  the  exact  triangular  factors  of  K are  completely  ignored.  E 
can  be  computed  but  all  we  really  want  to  know  is  that  E is,  in  some  sense, 
small  compared  to  K. 

It  would  be  nice  to  show  that  |e../k..|  is  tiny  for  all  i,  j but 

1 J 1 J 

experience  with  "fill  in"  shows  that  this  is  not  true.  Interestingly  enouqh 
it  is  true  for  the  Hilbert  segment  referred  to  in  an  earlier  section.  It 
will  be  shown  later  that  le^./k.^l  is  always  tiny  for  FEM  problems. 

It  can  happen  that  roundoff  destroys  positive  definiteness  (i.e.,  K-E 
is  not  positive  definite)  but  this  property  is  so  desirable  that  many  codes 
will  make  small  perturbations  in  diagonal  elements  of  K to  ensure  that 
every  element  of  D,  fho  diagonal  part,  of  U,  is  actually  positive.  We 
shall  assume  that  D is  positive  in  this  paper. 


- - — 
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In  exact  arithmetic  the  reduced  matrices  are  related  as  follows: 


,(j) 


partition  KVJ'  as  indicated,  6.  is  its  (1,1)  element, 

J 


(4) 


,(j)  - 


“I 


u.  M. 
3 3 


KU+1)  = M . . 

J J J J 


It  follows  from  (4)  that,  in  exact  arithmetic,  for  i = j+l,...,n 


(5) 


k(j+1)  < k{.V 


u _ -11  (equality  only  if  u^  = 0)  . 


Under  the  assumption  that  D remains  positive  (5)  will  also  hold  in  practice. 
It  is  this  monotonic  decrease  in  each  diagonal  element  which  makes  it  unneces- 
sary to  rearrange  rows  and  columns.  The  argument  is  worth  reviewing.  The 
dreaded  element  growth  is  defined  as 


(6) 


% = l™x,  •‘I?/-*  kie>l  • 

a, 8 


For  positive  definite  matrices 


(7) 

and  so,  by  (5), 

(8) 


g„  * (max  k(T  /max  k 1 < \ 
n „ ,•  n aaJ  — 

m,i  a 


Note  that  one  permutation  of  K (to  P^KP)  might  yield  a smaller  E (term 
by  term)  than  another  but  all  permutations  produce  equally  satisfactory  E's 
in  the  sense  that  REI/IAI  is  tiny.  This  possibility  has  not  received  much 
attention  because,  when  used  at  all,  permutations  are  selected  for  the  more 
important  goal  of  keeping  down  the  number  of  nonzero  elements  as  the  reduc- 
tion proceeds. 
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As  long  as  no  overflow  occurs  the  elements  of  L may  be  arbitrarily 
large:  it  is  the  size  of  the  kj1^,  m = 1 . . ,min(i ,j) , which  affect  e^ 

as  we  now  show. 


The  History  of  the  (i,j)  Element,  i < j. 

A typical  element  k. . will  be  modified  in  only  a few  of  the  n-1  steps 

» J 

of  the  reduction.  Let  the  mth  step  be  one  of  them.  First  the  multiplier 

£.  , m < i,  is  calculated.  The  division  k^/k^  will  not  be  done 
lm  mi  mm 

correctly  but  the  error  affects  e-m  and  not  e^..  The  key  calculation 
replaces  by  ki^^  which  satisfies 


(9) 


k(m+l ) _ .(m) 
Kij  Kij 


l k(m) 

Sm  mj 


e(m) 

ij 


where  ej^  is  the  error  incurred  in  the  multiplication  and  the  subtraction. 
In  order  to  relate  u..  (=  kj1.^)  to  k..  one  writes  down  (9)  for  m = 1,2,.. 
and  adds: 


(10) 


k(2> 

U 

k(3) 

ij 


. k?!> 

Kij 

- k!2) 

ij 


t k .e(1) 

ilKlj  ei j ’ 

£ k(2)  - e(2) 
i'2K2j  l j 


k(i) . k(i-D  . kd-n  e( i-n 

kij  ~ kij  " *i,i-l  i-l,j  ij 


i-1 


Cancelling  Y k!1^  from  each  side  yields 
m=2  1J 


(ID 


k(1)  _ kO)  V,  k(rn) 

ij  Kij  im  mj 


7e(m) 

ieij 


The  crucial  observation  is  that  since  kj^  * u^  (11)  can  be  rewritten  in 
the  illuminating  form 
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(12) 


(LU)ij 


k . . - e . . , 

ij  U 


e..  = y e!m) 
m '3 


H 


This  is  equivalent  to  equation  (3)  which  defined  E.  It  shows  that  e..  is 

* 3 

simply  the  sum  of  the  errors  made  in  modifying  the  (i,j)  element  of  K. 


Note  that  e!m^ 
* 3 


0 if  either  ^ n = 


mi 


3 or  k . = 0. 

mj 


For  sparse 


matrices  the  sum  over  m contains  few  nonzero  terms  compared  with  the 
order  n. 

It  remains  to  consider  ej^  and  the  actual  roundoff  errors  made  in 
the  calculation.  A careful  analysis  is  given  in  [4,  p.  100]  and  it  is  not 
relevant  to  our  purposes  to  reproduce  that  material.  Instead  of  the  true 
product  The  algorithm  computes  fj^  satisfying 


(13) 


f<*>  - 1.  u . + u . . 

ij  lm  mj  ij  im  mj 


Instead  of  the  difference  k(^  - fj^  the  algorithm  computes 
(14) 


,(m) 


(.(m+l)  _ .(m)  f(m)  (m),(m+l) 
ij  ' kij  ' fij  ‘ “ij  kij 

(m) 


where  and  otlj  are  complicated  but  satisfy  the  simple  bounds 


(15) 


i^i- 


where  e is  the  relative  precision  of  the  machine;  for  current  computations 
-14 

e _<  10  . In  passing  we  must  acknowledoe  that  (14)  does  not  hold  for  all 

North  American  computers;  the  appropriate  modifications  complicate  the 
analysis  without  changing  its  essential  features.  On  substituting  (13) 
into  (14)  and  comparing  with  (9)  the  desired  expression  is  obtained 


(16) 


e!1?1  - . * e,<*>k<iw') 

ij  ij  ij  mj  ij  ij 


Equations  (12)  and  (16)  give  an  exact  formula  for  e^,  no  approximations 
have  been  made.  If  k^+1^  is  huge  compared  to  k. .,  say  lO1^  k..,  then 

• J I J ' J 
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e\.'  will  (except  on  rare  occasions)  be  as  big  as  k... 

• J I J 

Before  proceeding  we  note: 

Case  1:  fjj*  = 0.  Then  ej^  = 0. 

Case  2:  kj1^  = 0,  fill-in.  Then  ct^  = 0. 

Case  3:  Exponent(kj^)  = Exponent  (f^).  cancellation.  Then  ot^  = 0. 

From  (16),  (12)  and  (15)  comes  a reasonable  bound  on  e..,  i < j, 

" J 

namely 


(17) 


1 J 1 


i-1 

I 

m 


£•  u . 
im  mj 


« 1 J 


m 


where  the  sum  is  over  all  m not  corresponding  to  Case  1.  If  K has  band- 
width 2w+l  then  the  sum  has  at  most  w-j+i  nonzero  terms.  The  matrix  E 
is  not  quite  symmetric  because  of  the  final  division  in  = u.^./u^"; 

it  is  necessary  to  add  e|u..|  to  the  bound  in  (17)  to  account  for  it, 

' J 

yielding 


(18) 


Many  inferences  can  be  drawn  from  (17)  and  (13).  This  is  done  in  the 
next  section. 

Roundoff  error  also  afflicts  the  forward  elimination  and  the  backsubsti 
tution  required  to  complete  the  solution  of  the  equations.  However  these 
errors  are  dominated  by  those  in  the  triangular  factorization.  To  incor- 
porate them  into  the  analysis  one  writes 

(L  ♦ iM  )y  - t ♦ .st  , 

(19) 

(U  + 6U)u  = y 
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1 

where  6L  and  611  have  the  desirable  property  that  they  are  tiny  compared 
to  L and  U,  element  to  element.  Consequently 

(20)  [K-E  + (6L)U  + L(6U)  + (6L)(6U)]u  = f + 5f  . 

Having  said  this  we  shall  confine  our  remarks  to  E. 

To  sum  up 


The  effect  of  all  the  roundoff  errors  in  the  direct  solu- 
tion of  Kv  = f is  to  produce  a vector  u which  satisfies  the 
equation 

(K  + 6K)u  = f+5f 

with  6K,  6f  given  by  (20),  (19),  (16),  and  (12).  * 


The  advantage  of  this  formulation  is  that  it  puts  the  errors  in  the 
solution  process  on  the  same  footing  as  the  other  three  parts  of  the  FEM 
method  which  are  afflicted  by  roundoff  and  were  mentioned  in  the  introduction. 

Each  of  these  effectively  makes  a perturbation  of  K,  of  f,  or  of  both. 

Now  it  is  reasonable  to  ask,  for  example,  whether  the  errors  made  in  assembling 
the  stiffness  matrix  are  more  damaging  than  those  made  in  reducing  the  equa- 
tions to  triangular  form  or  less  so. 

4 . Implications  of  the  Equivalent  Perturbation  Formulation 
The  stiffness  matrix  K is  symmetric  positive  definite  and  of  bandwidth  2w+l. 
Not  all  zero  elements  within  the  band  will  be  filled  in  during  the  reduction 
process.  Various  inferences  can  be  drawn  from  the  form  of  the  expressions 
(12)  and  (16)  for  the  elements  of  E. 

A.  Each  element  e . . within  the  band  is  a sum  of  at  most  2(w-j+i) 

* J 

roundoff  errors.  So  much  for  the  accumulation  effect  which  was  so  dreaded 


12 


! 


in  the  1940's. 

B.  The  Hilbert  segment  phenomenon  is  explained  as  follows.  It  happens 
that  all  the  elements  decay  during  elimination.  So  the  sum  in  (17)  is  domi- 

ince 


nated  by  the  first  term  (m  = 1);  je^  .J  = el*-nui j + I = ek.. .,  sil 
(2) 

ij  = *S'j  ~ ^i i ui  j * is  S1’ze  of  the  error  incurred  when  rounding 


numbers  to  working  precision. 

C.  In  general,  whenever  the  (i,j)  element  shrinks  below  its  original 


value  and  remains  below  it  then  e..  is  of  the  order  of  a unit  in  the  last 

1 J 


place  of  k. .. 

U 


D.  The  elements  £.  and  u . (=  d Z.  ) do  not  occur  separately  in 

im  mj  m jm 


the  analysis.  In  the  absence  of  overflow  or  underflow  d^  (a  pivot)  may 


be  tiny  or  (a  multiplier)  may  be  huge.  Only  their  product  counts  and. 


in  the  positive  definite  case, 


1 SmVj  ^ ^ ki  i kj  j ^ 


1/2 


< max  k. 


n 


E.  Bunch  [ ] has  given  exact,  but  complicated  expressions  for  the 

number  of  nonzero  terms  in  the  sum  in  (12),  (18),  and  (19),  in  terms  of  the 
number  of  nonzeros  in  each  row  of  K.  Our  cruder  expression  (w-j+i)  is 
not  ridiculously  pessimistic,  see  H. 

F.  For  full  positive  definite  matrices  in  which  the  reduced  matrices 


,(j) 


remain  positive  definite  Wilkinson  [11]  shows 


DEI)  < 2.5  n3/2e»Kll 


for  the  spectral  norm,  HMII  = Such  results  throw  away  information 

in  the  sense  that  there  are  many  matrices  E which  satisfy  this  inequality 
but  could  not  possibly  be  equivalent  perturbations  accounting  for  errors  in 
triangular  factorization.  The  reason  for  being  content  with  such  norm  bounds 
is  given  in  G below. 

G.  [ quivalont  perturbations  <SK  are  very  useful  for  specialists  who 
wish  to  compare  methods.  However  the  user  wishes  to  know  the  effect  of  these 


-«■  ... 


A 
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changes  on  the  solution.  Perturbation  Theory  (see  [4])  yields 


Theorem.  Let  Ax  = b,  (A+<5A)(x+6x)  = b+6b.  JMF  A and  A + 5A  are 
invertible  and  BA-1 6AB  < 1/2  then 


.-1 


where  k(A)  = HA11  * 11  A~  1 is  the  condition  number  of  A in  the  given  norm. 


Note  that  this  is  not  simply  an  asymptotic  result  holding  only  for 
infinitessimal  6A  and  6b. 

Since  II6KII/IIKII  and  ||6bll/||bl  are  small,  provided  that  n2e  < 1,  the 
computed  solution  must  be  accurate  unless  k(A)  is  large.  In  the  FEM  case 
much  better  bounds  than  Wilkinson's  general  one  can  be  given. 

H.  For  positive  definite  matrices 


£.  u . 
1 im  mj 


ik(m),.(m),1.(m)|  /k(m).  (m).l/2  . J/2 

|Kim  Kmj  /Kmm  1 - lKii  kjj  > - ^kiiKjj; 

k(m+l),  ,k(m+l) .(m+lhl/2  ,.(m).  (mhl/2  ,.  . ,1/2 

Kij  1 - lKii  kjj  ’ - kjj  ; -(kiijj; 


A simpler  but  cruder  bound  than  (17)  on  e. . for  matrices  of  bandwidth  2w+l 

i.l 

i s 


2, 

; n i.i 


(vi-jH ) (k . . 1 . . ) 

ii  j J 


Thus  e^  is  always  tiny  compared  with  the  geometric  mean  of  the  associated 
diagonal  elements.  In  fact  E can  be  majorized,  element  by  element,  by  the 
special  matrix  W defined  by 


wU'“ii-{0 


(w-j+i)(k.ikj.j.)1/2  , 


j-i  £ w , 
j-i  > 0 . 


t 
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2 

Hence  II E H < 2eRWll  < 2ew  max  k...  using  the  spectral  norm. 

i 11 

I.  In  one  important  respect,  which  is  rarely  mentioned,  the  equivalent 
perturbation  6K  from  the  solution  process  is  not  similar  to  the  perturbations 
corresponding  to  numerical  integration,  formation  of  the  global  stiffness 
matrix,  etc.  The  other  sources  all  give  small  relative  perturbations  of  the 
elements.  With,  fill-in  during  the  elimination  we  have  e..'s  which  are  tiny 

• J 

relative  to  I1KII  but  enormous,  or  »,  compared  to  k...  As  shown  in  G these 

• J 

fill-in  errors  do  not  impair  n 6u II / R u II  but  some  of  them  do  not  have  a rea- 
sonable FEM  interpretation.  They  correspond  to  wiggles  in  the  trial  functions 
far  outside  the  elements  to  which  they  belong. 

There  is  a device  for  removing  this  anomaly.  6K  is  split  into  a sum 
5S+6T  where  |(5S)^|  <_ew|k^|,  say,  and  |(6T)^.|  >ew|k^|.  The  equi- 
valent perturbation  relation  can  now  be  rewritten 

(K  + <SS)u  = f+Sf-STu  . 

All  that  remains  is  to  attribute  ( <ST u ) ^ to  one  or  several  of  the  larger 
elements  among  the  |ki|r)uj  or  to  f..  itself.  For  example  if  ^Tu)^  is 
less  than  the  uncertainty  in  f^ , for  each  i,  then  6Tu  is  best  regarded 
as  a further  perturbation  of  f. 

J.  Often,  but  not  always,  the  6K  due  to  errors  in  the  solution  pro- 
cess are  less  than  the  inherent  uncertainties  in  the  k..  or,  more  likely, 

■ J 

in  f.  In  such  cases  the  computed  u is  as  good  as  the  data  warrants  and 
the  only  legitimate  question  concerns  the  inherent  sensitivity  of  the  solu- 
tions to  perturbations  in  K from  any  source. 
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5.  Estimating  the  Condition  Number 

Current  programs  for  solving  general  nonsingular  systems  do  not  deliver 
any  estimates  of  the  accuracy  of  the  computed  solution.  A major  reason  for 
this  is  that,  at  present,  the  cost  of  estimating  the  condition  number  is  at 
least  equal  to  the  cost  of  the  solution,  either  in  time  or  in  storage  or  both. 
This  is  a price  which  users  appear  to  be  unwilling  to  pay.  No  one  knows  how 
far  the  cost  of  estimating  k can  be  reduced. 

For  FEM  problems  Fried  [5,6]  have  given  very  nice,  almost  computable, 
a priori  bounds.  In  particular,  for  energies  involving  m^  derivatives,  as 
h - 0, 


< = 0(h"2m) 


where  h is  the  minimal  diameter  of  an  element. 

Here  we  mention  some  a posteriori  techniques,  each  of  which  has  its 
drawbacks.  By  convention  we  use  the  spectral  norm.  Roundoff  will  be  ignored 
here. 

At  the  completion  of  the  calculation  there  is  available 
(i)  a such  that  IKS  = A (K)  < a < (2w+l)maxk.. 

iTldX  j II 

(ii)  L,  D such  that  LDLT  = K 

The  bandwidth  of  L is  w+1,  I = (e,,e»,...,e  ). 

i c n 

A.  Inverse  Iteration  (a  lower  bound  on  tc).  Solve  Lw  = e,  e^  = 1 1, 

and  Uw  = w.  The  sign  is  chosen  to  maximize  |w.|  for  each  j.  Then 

J 

IK"1!  > Ivl/flell  * llv||//n  and  k(K)  >_  (max  k^. ) B vB /w^n . 

Cost:  2nw  multiplications,  1 n-vector  for  w and  v. 

B.  Majorizing  L~^  (an  upper  bound  on  <).  Suppose  that  U = 0LT. 

-1  -1  -1  2 

Then  l!K  II  < IlD  H*IL  ! . The  following  observation  [8]  yields  an  inex- 
pensive  bound  on  IL"'li  = max  low  sum  of  L . Given  any  lower  triangular 
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matrix  T define  a new  lower  triangular  matrix  f by 

S‘i  = llii  I ’ *ij  = " I ti j I • 1 * J • 

It  can  be  shown  that  T ^ is  nonnegative  and,  element  for  element 
T ^ < T So  let  e = (1,1,..., 1)^  and  solve  Ly  = e for  y.  Then 

m-'ViiL'V*  lyico 

<JK)  - IKBJK^Lla-lyli/min  Ujj 

<3 

Cost:  nw  multiplications,  1 n-vector  for  y. 

Karasalo  [8]  points  out  that  for  a full  triangular  matrix  these  bounds 
can  sometimes  be  pessimistic  by  a factor  of  2n"\  For  banded  LU  factors 
the  bound  cannot  be  that  crude.  Anderson  [1]  offers  more  complicated  but 
sometimes  better  bounds  based  on  the  same  idea. 


C.  Iterating  on  the  Residuals  [4,  p.  109].  This  process  was  invented 
for  improving  the  accuracy  of  computed  solutions  to  ill-conditioned  systems. 
If  the  matrix  is  too  ill-conditioned  for  Gaussian  elimination,  with  the  given 
precision  to  arithmetic,  to  be  meaningful  then  the  sequence  {x.}  will 
almost  certainly  not  settle  down  at  all.  That  information  is  worth  having 
and  the  process  can  be  stopped  after  four  or  five  steps.  Now  suppose  that 
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convergence  does  occur.  It  is  linear  and  the  convergence  factor  depends  on 
the  condition  number.  If  convergence  is  slow  (6  or  7 steps)  then  k must 
be  large  and  can  be  estimated  from 

<(A)  t 2tflx3-x2B/Bx2-x1  II 

where  x2,  x3  are  the  three  previous  iterates,  x3  being  the  latest. 

If  convergence  is  immediate  (1  or  2 steps)  then  < may  or  may  not  be 
large. 

Cost:  (2v/+l)n  double  precision  operations  per  iteration.  However 

K and  x2  must  be  saved. 

D-  Diagonal  Energy  Criterion  [7],  The  user  is  not  content  to  know  E, 
even  in  principle,  but  needs  to  know  the  effect  of  E upon  his  system.  One 
measure  is  the  relative  change  in  the  strain  energy  for  a displacement  vector 
x,  i.e. 

x^Ex/x^Kx  . 


From  ( H)  in  Section  4 we  have 
Schwarz  Inequality  we  have 


leijl  1 2ew/k.7k_.  Using  the  Cauchy- 


(21) 


I xTEx! 


n n 


i2ew^lj^1A^j7xixj = n^xij 

n o r 

< 2ew  l k..x‘  = 2ewx  K .x  . 

- n i d 


By  using  standard  statistical  estimates  for  the  expected  value  of  e. 


(m) 


ij 


Irons  replaces  w by  SH.  He  also  suggests  that  wk..  can  be  replaced  by 

(m)2ll/2 

I k,-.:  where  the  sum  extends  over  those  m for  which  k..  changes. 

Hn  1 n 

However  this  estimate  is  relevant  only  to  perturbations  E attributable  to 
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Gaussian  elimination  whereas  (21)  covers  all  perturbations  satisfying 
I eij- 1 £ 2ew|kij|.  For  example,  it  would  be  appropriate  to  replace  2w  by  p 
to  account  for  the  error  in  assembling  the  global  stiffness  matrix.  Here  p 
is  the  maximum  number  of  elements  meeting  at  a node. 

When  K is  equilibrated  so  that  = I and  when  x is  an  eigenvector 
for  K's  smallest  eigenvalue  then  Irons'  bound  is  cv'w/A^  which  is  a 
very  good  estimate  of  ec(K)  where  c is  the  optimal  condition  number  of  K. 
In  practice  we  have  u,  the  computed  solution,  and  the  estimate  becomes 
evW  u^K^u/u^f. 

Cost:  3n  operations,  1 vector  to  save  f. 

6.  Optimal  Condition  Numbers  are  Irrelevant 

A well  known  result,  due  to  Bauer  [2],  establishes  that  Gaussian 
elimination,  roundoff  errors  included,  is  invariant  under  scaling.  More 
precisely,  if  the  sequence  of  pivotal  elements  is  fixed  and  if  the  scaling 
is  done  solely  with  the  powers  of  the  number  base  of  the  arithmetic  unit, 
then  the  fractional  parts  of  the  all  the  floating  point  numbers  which  occur 
in  the  algorithm  are  unchanged  if  the  calculation  is  repeated  on  a scaled 
system. 

A convenient  practical  consequence  of  this  fact  is  that,  if  one  is  pre- 
pared to  risk  that  no  under/overflow  occurs,  then  there  is  no  need  to  bother 
with  scaling  positive  definite  matrices,  A more  subtle  corollary  is  that 
discussion  of  elimination  should  employ  scaling  invariant  terms.  Now  the 
condition  number  with  respect  to  a given  norm, 

k(A)  = IAI-IA"1!  , 

is  certainly  not  invariant.  The  nearest  qualified  candidate  for  a measure 
of  near  singularity  is  the  optimal  condition  number 
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c(A)  = min  k(DAD) 

D 

over  all  positive  diagonal  matrices. 

This  is  a beautiful  example  of  solving  a problem  by  fiat!  In  fact  it 
can  be  argued  that  c is  simply  the  measure  which  makes  the  results  look 
their  best.  There  is  an  alternative  approach. 

For  general  linear  systems,  where  it  does  influence  pivot  selection, 
scaling  is  regarded  as  a vexing  and  difficult  problem.  This  is  because  the 
difficulty  is  not  mathematical  and  cannot  be  resolved  without  more  information. 
The  task  of  solving  a linear  system  in  noisy  arithmetic  is  not  properly  set 
until  the  user  specifies  a suitable  norm  for  measuring  the  right  hand  side 
(the  load  vector)  and  a suitable,  possibly  different,  norm  for  measuring  the 
solution  (the  displacements).  Ideally  these  norms  should  reflect  the  users 
interest  in  the  problem  and  they  should  have  the  property  that  vectors  of 
equal  norm  should  represent  states  that  have  equal  impact  on  the  user. 

Here  lies  the  rub.  Most  users  are  not  in  the  habit  of  formulating  their 
knowledge  of  their  problem  in  this  formal  manner.  Nevertheless  such  speci- 
fications do  resolve  the  difficulties. 

Let  B • II d and  1*11^  be  the  given  norms  for  displacements  and  loads, 
respectively.  The  corresponding  norm  for  a stiffness  matrix  K is  then 
given  by 

1KI j 5 max  »KuB0/«uB. 
d u*0  £ d 

and  the  correct  norm  of  a flexibility  matrix  Kf^  = F is 

|FId  = max  IFvl  ./lvl0  . 
v*0  0 11 


Finally  the  sensitivity  of  the  model  must  be  measured  by 


20 


<dit(K)  ■ IK'J'K-'lJ  . 

This  measure  is  also  scaling  invariant  by  decree  but  in  a more  relevant  way 
than  is  c.  If  K were  to  be  scaled  (in  fact  there  is  no  point  in  doing  so) 
this  would  correspond  to  a scaling  of  loads  and  displacements.  This  is  fine 
but  it  would  be  meaningless  not  to  change  the  norms  in  a corresponding  way. 
When  this  is  done  properly  the  new  condition  number  turns  out  to  be  Identical 
to  the  old  condition  number,  as  it  should  if  this  measures  the  nearness  to 
singularity  of  K for  the  user*s  purpose. 

The  formal  verification  of  these  remarks  is  left  as  an  exercise  for  the 
interested  reader. 

Although  this  viewpoint  clears  up  the  difficulty  of  choosing  the  right 
condition  number  it  is  very  much  a pure  mathematician's  solution.  The  proper 
matrix  norms  are  well  defined  but  how  are  they  to  be  constructed?  We  have 
no  answer  to  that  question. 

Another  reasonable  scaling  is  to  choose  0 so  that  the  absolute  uncer- 
tainty in  each  element  of  DKD  or  of  Df  is  constant. 
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